Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Conditional Random Field (CRF) layer #377

Closed

Conversation

howl-anderson
Copy link
Contributor

@howl-anderson howl-anderson commented Jul 30, 2019

related to: #314 #22
depend on: #314

What's include

  • TensorFlow Keras layer for CRF
  • Loss function for CRF

How to use it

The document is coming soon.

Progress

[x] Layer class for CRF
[x] Test cases

Limitation

  • Currently, only support right padding mask (due to the underline CRF functions), CRF layer will detect and report to user if mask contains left padding.

Roadmap (next version)

  • Include constraints: an optional list of allowed transitions (from_tag_id, to_tag_id). Learned from AllenNLP

@googlebot
Copy link

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@howl-anderson
Copy link
Contributor Author

@gabrieldemarmiesse I am working on using the add_loss way to implement the CRF layer. When it's done, I will update this PR.

@gabrieldemarmiesse
Copy link
Member

Thanks a lot @howl-anderson I hope it helped.

@albertnanda
Copy link

Any update here?

@howl-anderson
Copy link
Contributor Author

@albertnanda I encountered some problems during the implementation and are trying to solve them.

@gabrieldemarmiesse
Copy link
Member

I moved some code around in this PR, and I made #1363 which has less hacks. @howl-anderson , @Squadrick, please take a look when you can.

@howl-anderson , I've put you as author of the first commit.

@bodak
Copy link

bodak commented Mar 23, 2020

I moved some code around in this PR, and I made #1363 which has less hacks. @howl-anderson , @Squadrick, please take a look when you can.

@howl-anderson , I've put you as author of the first commit.

While _keras_history approach in this PR is unfortunate (but necessary to access the layer at loss), it has no impact on the end API user.

The add_loss() approach unless I missed something won't work due to not having access to y_true inside the layer.

@gabrieldemarmiesse Your approach seems to have introduced hacks that lead to multiple impacts on the API user: two models to track and fake targets to input. From an end user perspective, I would be hesitant to leave this to be handled externally.

@gabrieldemarmiesse
Copy link
Member

@bodak we can't use private APIs in code used by users.

If at the next release, this attribute disapears, we get an attribute error and we have to redo all the work of this pull request to find another pattern that works for CRF.

Or worst case scenario, the behavior changes silently (for example tensorflow decide to enforce that we can't use tensors or variables not declared in the signature in the loss), and all users get wrong numerical results/horrible error messages and we spend a lot of time understanding what is going on. To in the end fall back to #1363 because it's the only way that is garanteed to last since we don't use private APIs.

I agree with you that it leads to more confusing code for end users, but they'll have more flexibility and stability. For example, in the implementation proposed in #1363, the user is not forced to have the CRF layer be the last layer of the model.

If users don't like the pattern of having two models or passing the labels as inputs, they're free to use only one model and do a custom training loop. Making a custom training loop is getting easier and easier nowadays.

@mhartig
Copy link

mhartig commented Mar 23, 2020

Hi, huge thanks to you for making this effort and building a CRF layer for tensorflow 2. For my application I require the probabilities or confidences for the class predictions in order to conduct a majority vote. As far as I understand, if I use the variable self.potentials that is used in the function get_viterbi_decoding as additional output, this will yield the probabilities that come from the dense layer, so they are not necessarily the "true" probabilities for the CRF class prediction. Is there a simple way to get the predicted class together with the probability?
If I change the code from return decoded_sequence to return decoded_sequence, self.potentials I get a problem with the ranks of the expected outputs. Before my "solution" was to load another Model like so:
dense_layer = self.model.get_layer('crf')._dense_layer proba_predictor = tf.keras.Model(self.model.inputs, dense_layer.output) y_proba = proba_predictor.predict(x) y_pred_probas = softmax(y_proba).numpy().max(axis=-1)

but as expected this doubled the prediction time.
I would be very thankful for a solution.

@howl-anderson
Copy link
Contributor Author

Hi @mhartig, output the probabilities or confidences for the prediction is a good feature. After the CRF layer is merged, I can implement it.

@mhartig
Copy link

mhartig commented Mar 24, 2020

I have tried to modify the output of the CRF layer myself, unfortunately I'm not very familiar with the underlying structure of tf.keras and the context in which the class methods are called. The output shape seems to be restricted in more than one place, but it is very hard to see exactly where the expected output shape needs to be changed in order to return for example a tensor of shape (batch_size, sequence length, number_of_classes + 1) and dtype float32 instead of a tensor of shape (batch_size, sequence_length) and dtype int32.
If it doesn't take too much time for you, it would be awesome if you could share a list of places where the output settings need to be modified. I could do the manipulations within the layer by myself.

@boring-cyborg boring-cyborg bot added the github label Mar 25, 2020
@howl-anderson
Copy link
Contributor Author

@mhartig Since this CRF layer implement is not merged, so I am not planning to implement this feature now. But if I get enough free time, I willing to help you with this.

@HuggingLLM
Copy link

Any update?

@howl-anderson
Copy link
Contributor Author

@NLP-ZY waiting for tensorflow/tensorflow#37818

@soumayan
Copy link

any example of how to implement this crf layer?

@howl-anderson
Copy link
Contributor Author

@soumayan Currently, it's waiting for other PR. Doc will be added to the repo when it's ready.

@soumayan
Copy link

@howl-anderson when I'm installing tensorflow_addons package from your github repository using

!pip install git+https://github.com/howl-anderson/addons.git

I'm getting error--
ERROR: Failed building wheel for tensorflow-addons
Failed to build tensorflow-addons
ERROR: Could not build wheels for tensorflow-addons which use PEP 517 and cannot be installed directly

@howl-anderson
Copy link
Contributor Author

@soumayan You should download it and built it to a wheel package. https://github.com/tensorflow/addons has docs for how to build from source.

@zhoushaoxiang
Copy link

zhoushaoxiang commented Apr 26, 2020

excuse me,how to get accuray while training?I can‘t find crf_accuracy.py @howl-anderson

@howl-anderson
Copy link
Contributor Author

@zhoushaoxiang I removed it for more focus on CRF itself. I am planning to add metric functions (just like the ACC function ) after the CRF implement merged.

@gabrieldemarmiesse
Copy link
Member

@howl-anderson , I'll close this pull request in favor of #1733 which has an API as clean as this one, is flexible and can be serialized, the whole package without using any private api.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.